Abstract

In theoretical biology, one fundamental problem is the so-called error catastrophe
in Darwinian evolution models. We reexamine Eigen's fundamental equations by mapping
them onto a polymer depinning transition problem in a "genotype" space represented by a
unit hypercubic lattice $\{0,1\}^d$. The exact solution of the model shows that the error
catastrophe arises as a direct consequence of the equations involved and confirms some
previous qualitative results. The physically relevant consequence is that such equations
are not adequate to properly describe the evolution of complex life on Earth.

PACS numbers: 87.10.+e, 68.45.Gd 
An important question in the context of Darwinian "natural" selection theory is: how could
complex life evolve and finally reach the structure we see nowadays by selecting the fittest
species among the huge number of different allowed choices? Could we explain the mechanism
of self-organization (guided evolution) to complex life from basic principles, or is it necessary to
consider some other "external" organizing parameter? The number $N$ of different realizations
of a given virus DNA chain, made of a very long random sequence of basic units (chemical
bases), is typically $N \sim 10^{1000}$. Hence the time needed by random evolution to "explore"
all possible choices before reaching the optimum sequence (i.e. complex life) is enormous.

Here we consider a simplified model, that is, evolution in a genotype space of dimension $d$ with
a unique master sequence (MS) being the favored one, i.e. the one corresponding to individuals
with the highest fitness. All other sequences are supposed to have the same lower fitness, which,
for the sake of simplicity, we will take as unity. The quasi-species [?] can "diffuse" in this genotype
space with a mutation rate per base $t$, generally assumed to be very small. A so-called error
catastrophe arises because, as the chain length $d$ increases, the master sequence $I_m$ can hardly
survive evolution even though it has the highest fitness. In other words, we need an extremely large
fitness for $I_m$ or, equivalently, an exceedingly small mutation rate to keep the MS in a population.

The first investigation of this problem was carried out by Eigen and coworkers [?]. The aim of
the present work is to solve the problem exactly, with particular attention devoted to the
conditions under which the error catastrophe occurs.

In Eigen's model natural selection is described by a simple prototype evolution equation.
The space of configurations, i.e. the genotype space, is constructed from a set $\mathcal{I}$ of sequences of
uniform length comprising $d$ monomeric units, of which $k$ classes (chemical bases) can exist. The
number of different sequences is the cardinality of the set $\mathcal{I}$, obviously given by $N = |\mathcal{I}| = k^d$.
In the simplest case (the one we will consider in the following) $k = 2$ and the sequences are
made of binary units: $I_i = (a_1, \dots, a_d)$ with $a_j \in \{0, 1\}$ for all $j = 1, \dots, d$.

One can then introduce a continuous-time master equation for the concentration of individuals
$x_i$. The main result is that the target of selection is a species defined by the dominant,
that is the most probable, sequence $I_m$ (MS) [?], which is reached after a finite time in a
self-organized way, i.e. without any external fine tuning. Some heuristic arguments show that if the
excess production rate $A_m$ of the master sequence is too small compared with those of the
mutants, $A_{k \neq m}$, then the error catastrophe arises [?]: no convergence to the $I_m$ sequence takes place
and the dynamics is dominated by a random creation and annihilation of all possible individuals in
the set $\mathcal{I}$. If, however, such an error catastrophe really occurred in biological evolution, organized
life could not have evolved on Earth.

Our main goal in this article is the following: we solve the evolution equations exactly,
by means of a mapping onto a polymer localization problem, and prove that the error catastrophe
always occurs in Eigen's model: the ratio $a = A_m / A_{k \neq m}$ necessary to self-organize the process
towards the master sequence is exponentially large in the sequence length $d$. In other words, to localize
evolution around $I_m$, Eigen's rate equation needs an enormous selective advantage $a$ (in practice
never realizable).

We first define our system and the space of configurations: let us consider a $d$-dimensional
unit hypercubic lattice $\Omega = \{0,1\}^d$ mimicking a genotype space. Each side is made of only
two points representing binary units. Each point of $\Omega$ is in one-to-one correspondence with a
sequence $I_i$ ($i = 1, \dots, 2^d$), since the cardinality of $\mathcal{I}$ is equal to the number of distinct points of
$\Omega$ (we take $k = 2$). The discretized-time version of the rate equation, in the polymer context,
describes a depinning transition [?].

A polymer, directed along the time axis, moves in a $d$-dimensional space, here $\Omega$, subjected
to a contact energy term with energy gain $-u < 0$ per step of the interface located at the wall.
At finite temperatures $T > 0$ the polymer fluctuates in order to increase its configurational
entropy, but large fluctuations are unlikely [?].
The polymer is completely specified by the Hamiltonian

$$\mathcal{H}_L(\{h_i\}) = \sum_{i=1}^{L} \left( J\,|h_i - h_{i-1}| - u\,\delta_{h_i,0} \right), \qquad (1)$$

and the partition function [?]

$$Z_L(x) = \sum_{\{h\}} \exp\{-\mathcal{H}_L(\{h_i\})/T\}, \qquad (2)$$

where $h_i$ is the position of the polymer in $\Omega$ at time $i$. Overhangs are forbidden and the RSOS
(restricted solid-on-solid) condition is imposed.

In order to solve the problem we consider a transfer matrix approach (see also [?]):

$$Z_{L+1}(x) = \left(1 + (a-1)\,\delta_{x,0}\right) \left( \sum_{i=1}^{d} t\, Z_L(x + e^{(i)}) + (1 - dt)\, Z_L(x) \right), \qquad (3)$$

where we have introduced the unit vectors $e^{(i)} = (0, \dots, 1, \dots, 0)$ as those having a "1" bit in
the $i$-th position. Moreover we have defined the parameters $a = \exp(u/T)$ and $t = \exp(-J/T)$,
so that a step ending at the origin carries the weight $a$.
In this scheme the mutation rate $t \in [0, 1]$ can be thought of as the probability that a given
point $x$ in $\Omega$ jumps to a well-defined nearest neighbor $x + e^{(i)}$, and $1 - dt$ is the probability that
no jump occurs, that is $q^d = 1 - dt$. In the usual notation $q$ is indeed the probability of exact
replication of one base in the DNA chain.
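The recursion (3) can be iterated directly on a small lattice. The following is a minimal sketch, not the authors' code: it assumes that $x + e^{(i)}$ means flipping bit $i$ (addition modulo 2), and it uses plain power iteration, a technique chosen here for illustration, to estimate the largest eigenvalue. The function names are hypothetical.

```python
from itertools import product


def transfer_step(Z, d, t, a):
    """Apply eq. (3) once; Z maps each site x (a tuple of d bits) to Z_L(x)."""
    new_Z = {}
    for x, z_here in Z.items():
        # Hop in from each of the d nearest neighbours (one flipped bit), weight t.
        hop = sum(t * Z[x[:i] + (1 - x[i],) + x[i + 1:]] for i in range(d))
        value = hop + (1.0 - d * t) * z_here   # or stay put, weight 1 - d*t
        if not any(x):                         # pinning site x = (0, ..., 0)
            value *= a                         # factor 1 + (a - 1) * delta_{x,0}
        new_Z[x] = value
    return new_Z


def largest_eigenvalue(d, t, a, iterations=500):
    """Estimate the largest eigenvalue by power iteration (illustrative choice)."""
    Z = {x: 1.0 for x in product((0, 1), repeat=d)}
    eps = 1.0
    for _ in range(iterations):
        total = sum(Z.values())
        Z = {x: z / total for x, z in Z.items()}   # normalise to unit sum
        Z = transfer_step(Z, d, t, a)
        eps = sum(Z.values())                      # growth factor per step
    return eps


if __name__ == "__main__":
    print(largest_eigenvalue(d=4, t=0.05, a=1.0))  # no pinning
    print(largest_eigenvalue(d=4, t=0.05, a=3.0))  # strong pinning
```

Without pinning ($a = 1$) the uniform vector is reproduced at every step, so the estimate stays at 1; with $a > 1$ the estimate lies between the diagonal entry $a(1 - dt)$ and the largest row sum $a$. The full lattice has $2^d$ sites, so this direct approach is only practical for small $d$.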

We introduce a dual space representation to have periodic boundary conditions in all
directions: $Z_L(x) = \sum_{k \in \{0,1\}^d} (-1)^{x \cdot k}\, z_L(k)$ and $z_L(k) = 2^{-d} \sum_{x \in \{0,1\}^d} (-1)^{x \cdot k}\, Z_L(x)$. The
summation is over the $2^d$ possible binary realizations of $k$ and $x$. In the dual space equation
(3) takes the form

$$z_{L+1}(k) = s(k)\, z_L(k) + \frac{a-1}{2^d} \sum_{q \in \{0,1\}^d} s(q)\, z_L(q), \qquad (4)$$

with $s(q) = t \sum_{i=1}^{d} (-1)^{q_i} + 1 - dt$. Our goal is then to solve a $2^d$-dimensional eigenvalue problem for the
dual transfer matrix $\mathcal{M}$ acting on the r.h.s. of eq. (4). After some algebraic manipulation one
can show that the spectrum of the matrix is given by the $2^d$ solutions of the following equation:

$$\frac{a-1}{2^d} \sum_{k \in \{0,1\}^d} \frac{s(k)}{\varepsilon - s(k)} = 1. \qquad (5)$$

The search for an analytical solution can be simplified by recalling that in the thermodynamic
limit $L \to \infty$ the free energy density $f$ is dominated by the largest eigenvalue of the spectrum,
i.e. by the spectral radius of $\mathcal{M}$.
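Equation (5) can be solved numerically without enumerating all $2^d$ vectors $k$. The sketch below is an illustration under stated assumptions: it uses the fact that $s(k) = 1 - 2tw$ depends only on the number $w$ of 1-bits in $k$ (which follows from the definition of $s$), groups the sum by $w$ with binomial multiplicities, and finds the root $\varepsilon > 1$ by bisection on $\delta = \varepsilon - 1$. The bisection and the function names are choices made here, not taken from the text.

```python
from math import comb, sqrt


def excess_eigenvalue(d, t, a, iterations=200):
    """Return delta = eps - 1, where eps > 1 solves eq. (5); requires a > 1."""
    # comb(d, w) vectors k have w one-bits and share s = 1 - 2*t*w.
    weights = [comb(d, w) / 2**d for w in range(d + 1)]

    def g(delta):
        total = 0.0
        for w, p in enumerate(weights):
            s = 1.0 - 2.0 * t * w
            total += p * s / (delta + 2.0 * t * w)   # eps - s = delta + 2*t*w
        return (a - 1.0) * total - 1.0

    # g is positive just above delta = 0 and non-positive at delta = a - 1,
    # because the largest eigenvalue cannot exceed the largest row sum a.
    lo, hi = 0.0, a - 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def closed_form_d1(t, a):
    """Largest eigenvalue for d = 1, from the 2x2 characteristic polynomial."""
    b = (a + 1.0) * (1.0 - t)
    return 0.5 * (b + sqrt(b * b - 4.0 * a * (1.0 - 2.0 * t)))


if __name__ == "__main__":
    print(1.0 + excess_eigenvalue(1, 0.1, 2.0), closed_form_d1(0.1, 2.0))
    print(1.0 + excess_eigenvalue(100, 0.003, 2.0))
```

Working with $\delta$ rather than $\varepsilon$ keeps full floating-point precision when the eigenvalue is extremely close to 1, which is the regime of interest in the delocalized phase.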

We will list below, without proof, a series of exact results; all mathematical details will be
given elsewhere [?]. The maximum eigenvalue of the transfer matrix $\mathcal{M}$ is always nondegenerate,
as a consequence of the Frobenius-Perron theorem, and has a corresponding positive right
eigenvector. We can use $\mathcal{M}$ to calculate, as a first approximation, the bounds for the spectral
radius $\rho(\mathcal{M}) = \varepsilon$ by means of some theorems on positive matrices [11].

We note that, strictly speaking, one could have a phase transition for polymer localization,
i.e. a discontinuity in the derivatives of the thermodynamic potentials, only in the limit $d \to \infty$.
For finite $d$ the situation is less clear. At any finite dimension $d$ the total number of accessible
sites is finite and equal to $2^d$. As a consequence our polymer never wanders off to infinity, even in
the thermodynamic limit $L \to \infty$. However, if the pinning strength is not large enough the polymer
is "rough" in the sense that it can visit all the accessible configuration space up to the maximum size
allowed for that fixed $d$. On the other hand, in the "pinned" phase, the transversal localization
length $\ell$ within which the polymer is confined to the origin is independent of the linear size
$L$ and is always finite (even for $d \to \infty$). The two different behaviors are separated by a
characteristic value $u_{cr}$ of the pinning potential, which will be our definition of criticality. The
following statements are equivalent: in the unbounded state one has $\varepsilon \to 1^+$, vanishing free
energy per unit length $f$, and constant components $Z(x)$ of the positive eigenvector associated
with $\varepsilon$. The opposite holds in the localized phase.

Now we can turn our attention to equation (5). The idea is to transform it into a simpler
formula for $\varepsilon$ by means of a Feynman integral representation. The result is that the maximum
eigenvalue is given by the only real solution of the following implicit equation:

$$\frac{a}{a-1} = \varepsilon \int_0^\infty e^{-u(\varepsilon - 1 + dt)} \left(\cosh(ut)\right)^d du = \frac{\varepsilon}{\varepsilon - 1}\; {}_2F_1\!\left(-d;\, 1;\, \frac{\varepsilon - 1}{2t} + 1;\, \frac{1}{2}\right). \qquad (6)$$

We note that the integral diverges iff $\varepsilon = 1$. Therefore if the attractive potential at the origin
is omitted ($a = 1$), the maximum eigenvalue must equal unity, too. Then the free energy $f$
vanishes and we attain a delocalized phase, as expected. In the above formula ${}_2F_1(-a; b; c; d)$
is the usual hypergeometric series of negative argument $-a$ [?].
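The step from eq. (5) to the integral in eq. (6) can be sketched as follows. The intermediate manipulations are a reconstruction offered for the reader's convenience, not taken from the text; they use only the definition $s(k) = t \sum_i (-1)^{k_i} + 1 - dt$ and the fact that $\varepsilon > s(k)$ for every $k$ when $\varepsilon > 1$.

```latex
% Step 1: use s/(eps - s) = eps/(eps - s) - 1 in eq. (5)
\frac{a-1}{2^d}\sum_{k}\frac{s(k)}{\varepsilon - s(k)} = 1
\quad\Longrightarrow\quad
\frac{a}{a-1} = \frac{\varepsilon}{2^d}\sum_{k\in\{0,1\}^d}\frac{1}{\varepsilon - s(k)} .

% Step 2: exponential (Feynman-type) representation, valid for eps > s(k)
\frac{1}{\varepsilon - s(k)} = \int_0^\infty e^{-u\,(\varepsilon - s(k))}\,du .

% Step 3: the sum over k factorizes over the d bits
\frac{1}{2^d}\sum_{k\in\{0,1\}^d} e^{u\,s(k)}
  = e^{u(1-dt)}\prod_{i=1}^{d}\frac{e^{ut}+e^{-ut}}{2}
  = e^{u(1-dt)}\cosh^{d}(ut) .

% Result: the integral form of eq. (6)
\frac{a}{a-1} = \varepsilon\int_0^\infty e^{-u(\varepsilon-1+dt)}\cosh^{d}(ut)\,du .
```

Step 1 also shows that the integral in eq. (6) equals the finite sum $2^{-d}\sum_k 1/(\varepsilon - s(k))$, whose $k = 0$ term $2^{-d}/(\varepsilon - 1)$ is responsible for the divergence at $\varepsilon = 1$.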

We define $I(d;\varepsilon,t)$ as the integral in (6). The basic results are listed below, under the hypotheses
$a \in (1, \infty)$; $\varepsilon \in (1, \infty)$; $t \in [0, 1]$; $d \in (0, \infty)$ and $dt \in [0, 1]$:

1. $\varepsilon(a)$ is a convex non-decreasing function of $a$ (strictly convex for $d$ finite). $I(d;\varepsilon,t)$ is a
strictly convex, decreasing function of $d$.

2. For large $a$ the function $\varepsilon(a; d, t)$ is linear in $a$: $\varepsilon \simeq (1 - dt)\,a$, $(a \gg 1)$.

The shape of $\varepsilon I(d;\varepsilon,t)$ is shown in Fig. 2 as a function of $d$. The parameters $\{t, \varepsilon\}$ are fixed in
the physical range.

A detailed analysis of the asymptotic expansion of $I(d; \varepsilon, t)$ at large $d$ needs particular
attention, since we should properly take into account the condition $dt < 1$. This means that
both limits $d \to \infty$ and $t \to 0$ must be performed simultaneously in the expansion, in such
a way that the product $dt$ is held constant. The result of the calculation is:

$$\frac{a}{a-1} = \frac{\varepsilon}{\varepsilon - 1 + dt} + \frac{(dt)^2\, \varepsilon}{d\,(\varepsilon - 1 + dt)^3} + \frac{3 (dt)^4\, \varepsilon}{d^2 (\varepsilon - 1 + dt)^5} + O\!\left(\frac{1}{d^3}\right). \qquad (7)$$

This implicit algebraic equation can be solved for the maximum $\varepsilon$ and the result is compared
with the exact calculation performed by numerically finding the spectral radius of $\mathcal{M}$ for a given
set of parameters $\{d, t, a\}$ (see Fig. 1).
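The comparison between the truncated expansion (7) and the exact eigenvalue can be reproduced numerically. The sketch below is illustrative only: the exact value is obtained from $a/(a-1) = \varepsilon I$ with $I$ written as the finite sum $2^{-d} \sum_w \binom{d}{w} / (\varepsilon - 1 + 2tw)$ (an assumption that follows from eqs. (5) and (6)), and both equations are solved by bisection, a method chosen here. The parameters $d = 100$, $t = 0.003$ are those quoted for Fig. 1; the value $a = 2$ is an arbitrary choice above the threshold.

```python
from math import comb


def bisect_decreasing(f, lo, hi, iterations=200):
    """Root of a decreasing function f on (lo, hi) with f > 0 just above lo."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def eps_exact(d, t, a):
    """Largest eigenvalue from a/(a-1) = eps * I, with I as a finite sum."""
    weights = [comb(d, w) / 2**d for w in range(d + 1)]
    target = a / (a - 1.0)

    def f(delta):                                   # delta = eps - 1
        integral = sum(p / (delta + 2.0 * t * w) for w, p in enumerate(weights))
        return (1.0 + delta) * integral - target

    return 1.0 + bisect_decreasing(f, 0.0, a - 1.0)


def eps_asymptotic(d, t, a):
    """Largest eigenvalue from eq. (7), truncated after the 1/d^2 term."""
    x = d * t
    target = a / (a - 1.0)

    def f(delta):
        eps = 1.0 + delta
        c = delta + x                               # eps - 1 + d*t
        rhs = eps / c + x**2 * eps / (d * c**3) + 3.0 * x**4 * eps / (d**2 * c**5)
        return rhs - target

    return 1.0 + bisect_decreasing(f, 0.0, a - 1.0)


if __name__ == "__main__":
    print(eps_exact(100, 0.003, 2.0), eps_asymptotic(100, 0.003, 2.0))
```

Keeping only the first term of (7) gives $\varepsilon = a(1 - dt)$, i.e. 1.4 for these parameters; the $1/d$ term shifts this slightly upwards, and the two functions above should agree closely for $a$ above the threshold.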

The shape of the eigenvector corresponding to $\varepsilon$ is relevant from the point of view of the
depinning transition. It is represented by the sum of its components, $m(d; a, \varepsilon, t) = \sum_x Z(x)$,
with the eigenvector normalized so that $Z(0) = 1$ (consistent with the limits quoted below).
One can prove that

$$m(d; a, \varepsilon, t) = \frac{\varepsilon}{\varepsilon - 1}\, \frac{a - 1}{a} = \frac{1}{\varepsilon - 1}\, I(d; \varepsilon, t)^{-1}. \qquad (8)$$

As a direct consequence we have that (see eq. (9) below) $\lim_{a \to 1^+} m = 2^d$ and $\lim_{a \to \infty} m = 1$.
Fig. 1 shows the shape of $m$, comparing the numerical result obtained from the transfer matrix
with the analytical one from the asymptotic expansion truncated at order $O(1/d^3)$. The
agreement is very good. Our depinning transition can be easily studied in terms of $\mu = \log(m)$.
In the unbounded state the polymer wanders in all the accessible space of $\Omega$ and then $m$ reaches
its maximum value, while if $a$ is very high $\mu$ converges towards 0.
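Relation (8) can be checked on a small lattice by computing the positive eigenvector explicitly. The sketch below is an illustration, not the authors' procedure: it stores sites as integers whose bits are the components of $x$, assumes that a nearest-neighbor move flips one bit, obtains the eigenvector by power iteration, and compares $\sum_x Z(x)/Z(0)$ with the closed form in eq. (8). All function names are hypothetical.

```python
def perron_pair(d, t, a, iterations=1000):
    """Largest eigenvalue and positive eigenvector of eq. (3), by power iteration."""
    n = 1 << d                        # sites are integers; bit i is component i
    Z = [1.0 / n] * n
    eps = 1.0
    for _ in range(iterations):
        new = [
            sum(t * Z[x ^ (1 << i)] for i in range(d)) + (1.0 - d * t) * Z[x]
            for x in range(n)
        ]
        new[0] *= a                   # pinning weight at the origin x = 0
        eps = sum(new)                # Z had unit sum, so this is the growth factor
        Z = [v / eps for v in new]
    return eps, Z


def m_direct(Z):
    """Sum of the eigenvector components, normalised so that Z(0) = 1."""
    return sum(Z) / Z[0]


def m_from_eq8(eps, a):
    """Closed form of eq. (8)."""
    return eps / (eps - 1.0) * (a - 1.0) / a


if __name__ == "__main__":
    for a in (1.001, 3.0):
        eps, Z = perron_pair(d=6, t=0.05, a=a)
        print(a, m_direct(Z), m_from_eq8(eps, a))
```

For $d = 6$ the weak-pinning case should give $m$ just below $2^6 = 64$ (the polymer visits essentially the whole lattice), while strong pinning drives $m$ towards 1, in line with the two limits stated above.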

If one asks for the critical pinning $a_c$ necessary to localize the polymer at the origin, we
should fix the parameters $d$ and $t$, with the constraint $dt < 1$ necessary to preserve the
probabilistic interpretation, and search for the maximum allowed $a$ associated with an eigenvalue $\varepsilon$
"sufficiently" close to 1. We make this statement precise by regarding as close to 1 those values
which differ from unity by a quantity vanishing in the limit $d \to \infty$.

This definition can be justified, and made rigorous, by noting that for $a \to 1^+$ eq. (5) is
dominated by only one term in the sum and one gets the result (here $\varepsilon$ stands for the maximum
eigenvalue of the spectrum of $\mathcal{M}$)

$$\frac{a}{a-1} \simeq \frac{1}{2^d}\, \frac{\varepsilon}{\varepsilon - 1}, \quad \text{or} \quad \varepsilon \simeq 1 + \frac{a - 1}{1 + a(2^d - 1)} = 1 + \delta_d. \qquad (9)$$

Then we can properly define $a_c = \sup_{a \in (1, \infty)} \{a \mid \varepsilon < 1 + \delta_d\}$.

The conclusion is that, if $a$ is below $a_c$, $\varepsilon$ converges exponentially to $1^+$ in the limit $d \to \infty$.
Now we will prove the main physical result of this article, namely that this threshold is the
critical pinning $a_c$ necessary to localize the polymer and that we have, at criticality, for all $d$:

$$a_c = \frac{1 + \delta_d}{1 - dt} \quad \Longrightarrow \quad d_c \simeq -\frac{\log a}{\log q}, \qquad (10)$$

where $\delta_d$ is a function going to 0 as $O(2^{-d})$ (see eq. (9)).

The proof is rather simple if we look at the graphical interpretation of eq. (6), see also Fig. 2.
For a given set $\{d, t, a\}$ in the physical range, the nondegenerate $\varepsilon$ is found by intersecting the
curve $\mathcal{C} = \varepsilon I(d; \varepsilon, t)$ with the horizontal line $a/(a - 1) = A = \text{const}$. As we showed above, $\mathcal{C}$
asymptotically converges to $K = \varepsilon/(\varepsilon - 1 + dt)$ for large $d$ and then to $1/dt$ in the extreme
situation $\varepsilon \to 1^+$. If $a$ is too big, namely $A < K$, for a fixed $\varepsilon$, then no solution can be found
since $A$ is below $\mathcal{C}$. In that case a solution always exists but for a bigger $\varepsilon$, necessary to lower
$K$ below $A$. As a consequence, the critical $a_c$, following our definition, can be found by asking for
the maximum allowed $a$ compatible with a solution of the form $\varepsilon = 1 + \delta_d$. The answer is now
obvious and it is given by eq. (10).

Fig. 3 shows the critical dimension $d_c$ as a function of the pinning $a$ for two values of $t$.
The agreement between formula (10) and the numerical results is remarkable.
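As a worked illustration of formula (10), the sketch below neglects the exponentially small correction $\delta_d$ (an approximation made here for simplicity) and uses $q^d = 1 - dt$. It evaluates the critical dimension $d_c = t^{-1}(1 - 1/a)$ and the critical selective advantage $a_c = q^{-d}$; the function names and the sample values are hypothetical.

```python
from math import log


def critical_dimension(t, a):
    """d_c from a_c = 1/(1 - d*t), i.e. eq. (10) with delta_d neglected."""
    return (1.0 - 1.0 / a) / t


def critical_advantage(q, d):
    """a_c = q**(-d), using q**d = 1 - d*t and neglecting delta_d."""
    return q ** (-d)


if __name__ == "__main__":
    t, a = 1e-2, 2.0
    d_c = critical_dimension(t, a)
    q = (1.0 - d_c * t) ** (1.0 / d_c)     # per-base replication fidelity at d_c
    print(d_c, -log(a) / log(q))           # the two forms of eq. (10) coincide
    print(critical_advantage(0.99, 1000))  # required advantage for d = 1000
```

With a per-base fidelity of $q = 0.99$, a chain of length $d = 1000$ would already require a selective advantage of order $10^4$, which illustrates how quickly $a_c$ grows with the sequence length at fixed $q$.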

Conclusion: in this Letter we have reexamined the evolution equations introduced by Eigen
and coworkers in order to mimic Darwinian natural selection in biological evolution. A particle
diffusing on the $\Omega$ space and subjected to an attractive wall localized at the origin can be viewed,
in the evolutionary context, as a reproduction process in the genotype space. The mutation
rate $t$ and the excess production rate $A_i$ for a given DNA sequence are easily mapped onto
other physical quantities of the polymer localization problem. Our main result is that the
so-called error catastrophe problem naturally arises as a consequence of the model introduced.
In other words, the exact solution of Eigen's equations shows that even though a given sequence
has the highest fitness, random natural selection can never bring evolution towards the MS,
since one needs an extremely large fitness for the master sequence itself (exponentially large in
the sequence length $d$) in order to keep it alive. This result suggests that Eigen's equations
are intrinsically inadequate if one is interested in the mechanisms which explain the origin
and the development of complex life on Earth (characterized by individuals of very high
fitness). We therefore argue that a more realistic explanation of this complex phenomenology
is still lacking.
Figure captions

Fig. 1

Main panel: maximum eigenvalue of the transfer matrix $\mathcal{M}$ plotted vs. the pinning strength
$a$. Numerical data: full line; analytical result (up to order $O(1/d^3)$): circles. The dashed lines
are the bounds for $\varepsilon$ obtained from the transfer matrix. Inset: $\log[m(a)]$ vs. $a$. In all
cases $d = 100$, $t = 0.003$. Numerical data: full line. Analytical result: circles.

Fig. 2

Shape of $I(d;\varepsilon,t)$ and of $a/(a - 1)$ vs. $d$ (see text). The dashed line gives the asymptotic limit
of $I$ for large dimensions $d$ at $\varepsilon \to 1^+$.

Fig. 3

Critical dimension $d_c$ vs. pinning strength $a$ for two distinct values of $t$. Lower curve: $t = 10^{-2}$;
upper curve: $t = 10^{-3}$. Full lines represent the function $d_c = t^{-1}(1 - 1/a)$ (see text). Circles
and squares: numerical data from the transfer matrix.